AITopics | pii type

Collaborating Authors

pii type

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach

Yang, Hua, Velasco, Alejandro, Fang, Sen, Xu, Bowen, Poshyvanyk, Denys

arXiv.org Artificial IntelligenceDec-10-2025

Large language models for code (LLM4Code) have greatly improved developer productivity but also raise privacy concerns due to their reliance on open-source repositories containing abundant personally identifiable information (PII). Prior work shows that commercial models can reproduce sensitive PII, yet existing studies largely treat PII as a single category and overlook the heterogeneous risks among different types. We investigate whether distinct PII types vary in their likelihood of being learned and leaked by LLM4Code, and whether this relationship is causal. Our methodology includes building a dataset with diverse PII types, fine-tuning representative models of different scales, computing training dynamics on real PII data, and formulating a structural causal model to estimate the causal effect of learnability on leakage. Results show that leakage risks differ substantially across PII types and correlate with their training dynamics: easy-to-learn instances such as IP addresses exhibit higher leakage, while harder types such as keys and passwords leak less frequently. Ambiguous types show mixed behaviors. This work provides the first causal evidence that leakage risks are type-dependent and offers guidance for developing type-aware and learnability-aware defenses for LLM4Code.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.07814

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SA-ADP: Sensitivity-Aware Adaptive Differential Privacy for Large Language Models

Etuk, Stella, Matrawy, Ashraf

arXiv.org Artificial IntelligenceDec-2-2025

Despite advances in the use of large language models (LLMs) in downstream tasks, their ability to memorize information has raised privacy concerns. Therefore, protecting personally identifiable information (PII) during LLM training remains a fundamental challenge. Conventional methods like Differential Privacy-Stochastic Gradient Descent (DP-SGD) provide robust privacy protection via uniform noising, protecting PII regardless of its distinct sensitivity. This comes at the expense of the model's utility, leading to a trade-off. In this paper, we propose SA-ADP, a sensitivity-aware approach that allocates noise based on the sensitivity of individual PII. We evaluated our method on four datasets (ABCD, CUSTOMERSIM, Wikitext-2, and UNSW-NB15 ). Our results show that SA-ADP achieves results comparable to the baseline (No-DP) and the conventional DP-SGD. This means that our method did not degrade the model's utility while still maintaining strong privacy protection.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2512.01748

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Scalable multilingual PII annotation for responsible AI in LLMs

Meena, Bharti, Skubisz, Joanna, Rajgarhia, Harshit, Dave, Nand, Ganesh, Kiran, Dalmia, Shivali, Mukherji, Abhishek, Sundarababu, Vasudevan

arXiv.org Artificial IntelligenceOct-13-2025

Abstract--As Large Language Models (LLMs) gain wider adoption, ensuring their reliable handling of Personally Identifiable Information (PII) across diverse regulatory contexts has become essential. This work introduces a scalable multilingual data curation framework designed for high-quality PII annotation across 13 underrepresented locales (Table I), covering approximately 336 locale-specific PII types. Our phased, human-in-the-loop annotation methodology combines linguistic expertise with rigorous quality assurance, leading to substantial improvements in recall and false positive rates from pilot, training, and production phases. Beyond reporting empirical gains, we highlight common annotator challenges in multilingual PII labeling and demonstrate how iterative, analytics-driven pipelines can enhance both annotation quality and downstream model reliability. I. Introduction A. PII Data Protection The surge in user-generated content has led to vast textual corpora containing hidden instances of Personally Identifiable Information (PII) in application forms, support tickets, reviews and social media posts [1]. PII--such as NAME, SSN, and PHONE NUMBER--poses significant privacy risks if not handled correctly. Its compromise can result in identity theft, financial fraud, and unauthorized access to sensitive data [2].

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.0625

Country:

Europe (1.00)
Asia (0.68)
North America > United States (0.46)

Genre: Research Report (0.40)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing

Hughes, Anthony, Duddu, Vasisht, Asokan, N., Aletras, Nikolaos, Ma, Ning

arXiv.org Artificial IntelligenceOct-10-2025

Language models (LMs) may memorize personally identifiable information (PII) from training data, enabling adversaries to extract it during inference. Existing defense mechanisms such as differential privacy (DP) reduce this leakage, but incur large drops in utility. Based on a comprehensive study using circuit discovery to identify the computational circuits responsible PII leakage in LMs, we hypothesize that specific PII leakage circuits in LMs should be responsible for this behavior. Therefore, we propose PATCH (Privacy-Aware Targeted Circuit PatcHing), a novel approach that first identifies and subsequently directly edits PII circuits to reduce leakage. PATCH achieves better privacy-utility trade-off than existing defenses, e.g., reducing recall of PII leakage from LMs by up to 65%. Finally, PATCH can be combined with DP to reduce recall of residual leakage of an LM to as low as 0.01%. Our analysis shows that PII leakage circuits persist even after the application of existing defense mechanisms. In contrast, PATCH can effectively mitigate their impact.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.07452

Country:

North America > United States (1.00)
Asia (1.00)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Add feedback

PII-Bench: Evaluating Query-Aware Privacy Protection Systems

Shen, Hao, Gu, Zhouhong, Hong, Haokai, Han, Weili

arXiv.org Artificial IntelligenceFeb-25-2025

The widespread adoption of Large Language Models (LLMs) has raised significant privacy concerns regarding the exposure of personally identifiable information (PII) in user prompts. To address this challenge, we propose a query-unrelated PII masking strategy and introduce PII-Bench, the first comprehensive evaluation framework for assessing privacy protection systems. PII-Bench comprises 2,842 test samples across 55 fine-grained PII categories, featuring diverse scenarios from single-subject descriptions to complex multi-party interactions. Each sample is carefully crafted with a user query, context description, and standard answer indicating query-relevant PII. Our empirical evaluation reveals that while current models perform adequately in basic PII detection, they show significant limitations in determining PII query relevance. Even state-of-the-art LLMs struggle with this task, particularly in handling complex multi-subject scenarios, indicating substantial room for improvement in achieving intelligent PII masking.

information, pii entity, pii type, (12 more...)

arXiv.org Artificial Intelligence

2502.18545

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Montserrat (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

End-to-End triplet loss based fine-tuning for network embedding in effective PII detection

Kohli, Rishika, Gupta, Shaifu, Gaur, Manoj Singh

arXiv.org Artificial IntelligenceFeb-13-2025

There are many approaches in mobile data ecosystem that inspect network traffic generated by applications running on user's device to detect personal data exfiltration from the user's device. State-of-the-art methods rely on features extracted from HTTP requests and in this context, machine learning involves training classifiers on these features and making predictions using labelled packet traces. However, most of these methods include external feature selection before model training. Deep learning, on the other hand, typically does not require such techniques, as it can autonomously learn and identify patterns in the data without external feature extraction or selection algorithms. In this article, we propose a novel deep learning based end-to-end learning framework for prediction of exposure of personally identifiable information (PII) in mobile packets. The framework employs a pre-trained large language model (LLM) and an autoencoder to generate embedding of network packets and then uses a triplet-loss based fine-tuning method to train the model, increasing detection effectiveness using two real-world datasets. We compare our proposed detection framework with other state-of-the-art works in detecting PII leaks from user's device.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.09002

Country: Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback